SYMMETRIC TYPE IMAGE FILTER PROCESSING APPARATUS 
AND PROGRAM AND METHOD THEREFOR 



BACKGROUND OF THE INVENTION 
The present invention relates to a symmetric type image filter
processing apparatus, and to a program and a method for filtering
image data in the apparatus, on a computer that provides a command
set called single instruction stream, multiple data stream (SIMD)
commands for speeding up multimedia processing.
Description of the Related Art 

In an existing general-purpose processor or an existing digital signal
processor (DSP), filtering processes for image data are executed by using
SIMD commands, in which plural data are processed simultaneously by
one command. For example, assume that the kernel size of an image
filter is N x M and the number of pixels in the row direction of the
image data is P. Without SIMD commands, obtaining the P operation
result pixels of one row of the image data generally requires
P x 2 x N x M operation steps. That is, each of the P operation result
pixels requires N x M multiplications and N x M additions.

Fig. 1 is a flowchart showing processes for obtaining operation
result pixels at a conventional symmetric type image filter. As shown
in Fig. 1, by using SIMD commands that can simultaneously process
Q pieces of sequential data at one time, M source pixels S in the
column direction are multiplied by the kernel coefficients corresponding
to those M pixels, and the multiplied results are added cumulatively.
This process is repeated N times, each time moving the reading start
position of the pixels one pixel in the row direction. As the result of
the operation, the operation result pixels of one row are obtained.
Here, Q > 1 and P > Q. In Fig. 1, the processes that use the SIMD
commands are shown in the parallelograms. As mentioned above, only
(2 x N x M) x P / Q operation steps are needed to obtain the operation
result pixels of one row by using the SIMD commands. That is, when the
SIMD commands are used, the operation result pixels are obtained Q times
faster than in a case where the SIMD commands are not used.
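
The two step counts above can be checked with simple arithmetic. The following Python sketch is illustrative only: the function names are hypothetical, and the values P = 256, N = M = 13 and Q = 4 are assumed sample values.

```python
def steps_without_simd(P, N, M):
    # One multiplication and one addition per kernel coefficient,
    # for each of the P result pixels in a row.
    return P * 2 * N * M


def steps_with_simd(P, N, M, Q):
    # Each SIMD command processes Q sequential pixels at once,
    # so the count shrinks by a factor of Q (P assumed divisible by Q).
    return (2 * N * M) * P // Q


P, N, M, Q = 256, 13, 13, 4   # assumed sample values
print(steps_without_simd(P, N, M))   # 86528
print(steps_with_simd(P, N, M, Q))   # 21632
```

With these sample values the ratio of the two counts is exactly Q, which is the speed-up stated in the text.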

As a technology for processing image data by using a symmetric
type image filter, for example, Japanese Patent No. 2862388 teaches
filtering processes in a super high speed image processing system. In
this patent, processing elements are arranged whose number is the same
as the number of pixels in one row, or a few less than that number,
and a parallel process is applied to every pixel. The number of
operations and the number of transfers in the filtering processes are
thereby decreased, and high speed processing is realized.

However, this technology using the symmetric type image
filter does not describe a method for utilizing the SIMD commands
efficiently.

In many cases, the kernel coefficients of the image filter have
symmetry. Therefore, the multiplication and addition results
calculated when obtaining operation result pixels at the left side
can be used for obtaining operation result pixels at the right side.

However, in the conventional technology, the multiplication
and addition results calculated when obtaining the operation result
pixels at the left side are not used for obtaining the operation result
pixels at the right side. Instead, the operation result pixels at the
right side are calculated by executing the multiplication and addition
again. Consequently, there is a problem that a further speed-up by
utilizing the symmetry of the kernel coefficients cannot be realized.



SUMMARY OF THE INVENTION 

It is therefore an object of the present invention to provide a
symmetric type image filter processing apparatus, and a program and a
method for filtering image data at the symmetric type image filter
processing apparatus, in which SIMD commands are utilized efficiently
to speed up the filtering processes at a symmetric type image filter
composed of symmetric kernel coefficients.

According to a first aspect of the present invention, for
achieving the object mentioned above, there is provided a symmetric type
image filter processing apparatus, which processes image data by a
symmetric type image filter composed of N x M kernel coefficients (N and
M are odd integers of 3 or more). The symmetric type image filter
processing apparatus includes an operating means that multiplies kernel
coefficients of the right side column or the left side column with
respect to the center column by column elements of image data
corresponding to the right side column or the left side column and
cumulatively adds the multiplied results, a memorizing means that
memorizes, as intermediate data, the multiplied and cumulatively added
results obtained at the operating means, and a pixel value calculating
means that calculates pixel values of the image data by cumulatively
adding the intermediate data memorized in the memorizing means.

According to a second aspect of the present invention, in the
first aspect, the operating means multiplies the kernel coefficients of the
right side column or the left side column by the column elements of the
image data corresponding to the right side column or the left side column
and cumulatively adds the multiplied results, and calculates
intermediate data in one row of the image data, and the pixel value
calculating means reads out the intermediate data corresponding to the
position of each pixel of the image data, and calculates the pixel value by
cumulatively adding the read out intermediate data.

According to a third aspect of the present invention, in the first
or second aspect, the operating means and the pixel value calculating
means execute the operation of the multiplication and the cumulative
addition by using SIMD commands.

According to a fourth aspect of the present invention, in the
first aspect, the number of pixels in one row of the image data is P (P is a
positive integer), and the operating means multiplies each kernel
coefficient of M pieces in each column of { (N + 1) / 2 } columns at the
right or left side by each pixel of M pieces in the column direction of the
image data and cumulatively adds the multiplied results, by using SIMD
commands that are capable of processing data of sequential Q pieces
simultaneously (Q > 1 and Q is a positive integer satisfying the condition
P > Q), and executes this multiplying and cumulatively adding operation
P / Q times, and generates the intermediate data in one row of the image
data.

According to a fifth aspect of the present invention, there is
provided a program for making a computer execute filter processing on
image data by using a symmetric type image filter composed of N x M
kernel coefficients (N and M are odd integers of 3 or more). The
program includes an operating step that multiplies kernel coefficients
of the right side column or the left side column with respect to the
center column by column elements of image data corresponding to the
right side column or the left side column and cumulatively adds the
multiplied results, a memorizing step that memorizes, as intermediate
data, the multiplied and cumulatively added results obtained at the
operating step, and a pixel value calculating step that calculates pixel
values of the image data by cumulatively adding the intermediate data
memorized at the memorizing step.

According to a sixth aspect of the present invention, in the fifth 
aspect, the operating step multiplies the kernel coefficients of the right 
side column or the left side column by the column elements of the image 
data corresponding to the right side column or the left side column and 
cumulatively adds the multiplied results, and calculates intermediate 
data in one row of the image data, and the pixel value calculating step 
reads out the intermediate data corresponding to the position of each 
pixel of the image data, and calculates the pixel value by cumulatively 
adding the read out intermediate data. 

According to a seventh aspect of the present invention, in the 
fifth or sixth aspect, the operating step and the pixel value calculating 
step execute the operation of the multiplication and the cumulative 
addition by using SIMD commands. 

According to an eighth aspect of the present invention, in the 
fifth aspect, the number of pixels in one row of the image data is P (P is a 
positive integer), and the operating step multiplies each kernel 
coefficient of M pieces in each column of { (N + 1) / 2 } columns at the
right or left side by each pixel of M pieces in the column direction of the 
image data and cumulatively adds the multiplied results, by using SIMD 
commands that are capable of processing data of sequential Q pieces 
simultaneously (Q > 1 and Q is a positive integer satisfying the condition 
P > Q), and executes this multiplying and cumulatively adding operation 
P / Q times, and generates the intermediate data in one row of the image 
data. 

According to a ninth aspect of the present invention, there is
provided a method for processing image data by a symmetric type image
filter composed of N x M kernel coefficients (N and M are odd integers
of 3 or more). The method includes the steps of multiplying kernel
coefficients of the right side column or the left side column with
respect to the center column by column elements of image data
corresponding to the right side column or the left side column and
cumulatively adding the multiplied results, memorizing the multiplied
and cumulatively added results as intermediate data, and calculating
pixel values of the image data by cumulatively adding the memorized
intermediate data.

According to a tenth aspect of the present invention, in the
ninth aspect, the intermediate data in one row of the image data are
calculated by multiplying the kernel coefficients of the right side column
or the left side column by the column elements of the image data
corresponding to the right side column or the left side column and
cumulatively adding the multiplied results, and the pixel values are
calculated by reading out the intermediate data corresponding to the
position of each pixel of the image data, and by cumulatively adding the
read out intermediate data.

According to an eleventh aspect of the present invention, in the 
ninth or tenth aspect, the multiplying operation and the cumulatively 
adding operation and the pixel value calculating operation are executed 
by using SIMD commands. 

According to a twelfth aspect of the present invention, in the
ninth aspect, the number of pixels in one row of the image data is P (P is
a positive integer), and the intermediate data in one row of the image
data are generated by P / Q times of the multiplying and cumulatively
adding operation that multiplies each kernel coefficient of M pieces in
each column of { (N + 1) / 2 } columns at the right or left side by each
pixel of M pieces in the column direction of the image data and
cumulatively adds the multiplied results, by using SIMD commands that
are capable of processing data of sequential Q pieces simultaneously (Q >
1 and Q is a positive integer satisfying the condition P > Q).



BRIEF DESCRIPTION OF THE DRAWINGS 
The objects and features of the present invention will become 
more apparent from the consideration of the following detailed 
description taken in conjunction with the accompanying drawings in 
which: 

Fig. 1 is a flowchart showing processes for obtaining operation 
result pixels at a conventional symmetric type image filter; 

Fig. 2 is a block diagram showing a structure of a symmetric 
type image filter processing apparatus at an embodiment of the present 
invention; 

Fig. 3 is a diagram showing an example of utilizing 
intermediate data at the embodiment of the present invention; 

Fig. 4A is a flowchart showing processes at a row-wise 
intermediate data generating section shown in Fig. 2; 

Fig. 4B is a flowchart showing processes at a row-wise 
intermediate data utilizing section shown in Fig. 2; 

Fig. 5 is a diagram showing a symmetric type image filter 
composed of symmetric kernel coefficients at the embodiment of the 
present invention; 

Fig. 6 is a diagram showing a source image used in an actual
example at the embodiment of the present invention; and

Fig. 7 is a diagram showing the reduced rate of SIMD 
command steps at the symmetric type image filter processing apparatus 
at the embodiment of the present invention. 

DESCRIPTION OF THE PREFERRED EMBODIMENT 

Referring now to the drawings, an embodiment of the present
invention is explained in detail.

Fig. 2 is a block diagram showing a structure of a symmetric
type image filter processing apparatus at the embodiment of the present
invention. Referring to Fig. 2, a method in which intermediate data
are reused at the symmetric type image filter processing apparatus is
explained.

As shown in Fig. 2, the symmetric type image filter processing
apparatus at the embodiment of the present invention includes a
row-wise intermediate data generating section 1, a row-wise
intermediate data utilizing section 2, and a memory 3. The row-wise
intermediate data generating section 1 and the row-wise intermediate
data utilizing section 2 are connected to a SIMD register X.
The row-wise intermediate data generating section 1 generates
intermediate data (cumulative multiplication and addition intermediate
results), which are used when operation result pixels (pixel values of
the operation result pixels) of a source image S are obtained.
The row-wise intermediate data utilizing section 2 obtains operation
result pixels of one row in the symmetric type image filter processing
apparatus by utilizing the intermediate data generated at the row-wise
intermediate data generating section 1. The memory 3 memorizes the
intermediate data and the operation result pixels in the symmetric type
image filter processing apparatus.

Here, the number of pixels in one row of a source image is
defined as P, and a SIMD command is defined to process Q pieces of
sequential data simultaneously, where Q is a positive integer
satisfying Q > 1 and P > Q. The SIMD register X (register for SIMD
commands), which is used when the SIMD commands are executed, can store
Q data elements at the same time.

The technology for simultaneously processing plural data by
using the SIMD commands is an existing technology; therefore, the
detailed explanation is omitted.

Consider a case in which operation result pixels are obtained by
using a symmetric type image filter processing apparatus having symmetric
kernel coefficients whose kernel size is N x M. In this case,
first, the multiplication and addition operation results (intermediate
data) between M pixels in the column direction of the subject image data
to which the filtering is applied and the kernel coefficients
corresponding to these M pixels are obtained. These multiplication and
addition operation results (intermediate data) can then be used when
other operation result pixels, at the position moved from the subject
image data by (N + 1) / 2 pixels, are obtained.

Fig. 3 is a diagram showing an example of utilizing the
intermediate data at the embodiment of the present invention. As
shown in Fig. 3, when the operation result pixels D (i, j) and D (i + 2, j)
are obtained from the subject image data by using the symmetric type
image filter processing apparatus having the symmetric kernel
coefficients of the N x M kernel size, the following multiplication and
addition operation is executed. Here, N and M are odd integers of 3 or
more, and the operation is written out for N = M = 3.

D (i, j) = S (i + 0, j + 0) x K (0, 0)
         + S (i + 1, j + 0) x K (1, 0)
         + S (i + 2, j + 0) x K (2, 0)
         + S (i + 0, j + 1) x K (0, 1)
         + S (i + 1, j + 1) x K (1, 1)
         + S (i + 2, j + 1) x K (2, 1)
         + S (i + 0, j + 2) x K (0, 2)
         + S (i + 1, j + 2) x K (1, 2)
         + S (i + 2, j + 2) x K (2, 2)

and

D (i + 2, j) = S (i + 2, j + 0) x K (0, 0)
             + S (i + 3, j + 0) x K (1, 0)
             + S (i + 4, j + 0) x K (2, 0)
             + S (i + 2, j + 1) x K (0, 1)
             + S (i + 3, j + 1) x K (1, 1)
             + S (i + 4, j + 1) x K (2, 1)
             + S (i + 2, j + 2) x K (0, 2)
             + S (i + 3, j + 2) x K (1, 2)
             + S (i + 4, j + 2) x K (2, 2)

In these obtained operation result pixels, common items exist.
The common items are as follows:

S (i + 2, j + 0) x K (2, 0) = S (i + 2, j + 0) x K (0, 0)
S (i + 2, j + 1) x K (2, 1) = S (i + 2, j + 1) x K (0, 1)
S (i + 2, j + 2) x K (2, 2) = S (i + 2, j + 2) x K (0, 2)

Here, K (2, 0) = K (0, 0), K (2, 1) = K (0, 1), and K (2, 2) = K (0, 2).

In the embodiment of the present invention, when one of the
operation result pixels is obtained, the multiplication and addition
results of one row are kept as the intermediate data, and the other
operation result pixel is obtained by utilizing the common items in the
obtained intermediate data.
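
The reuse of the common items can be illustrated with a small Python sketch for a 3 x 3 kernel. The kernel values, the source image function and all names below are hypothetical and chosen only for illustration; the only property taken from the text is that kernel column 2 equals kernel column 0.

```python
# Hypothetical 3 x 3 kernel, symmetric left-to-right: K(2, m) == K(0, m).
KERNEL = [  # KERNEL[m][n]: row m, column n
    [1, 2, 1],
    [3, 5, 3],
    [1, 2, 1],
]


def K(n, m):
    return KERNEL[m][n]


def S(i, j):
    # Hypothetical source image: any deterministic pixel values will do.
    return 7 * i + 3 * j + (i * j) % 5


def direct(i, j):
    # Conventional evaluation: all 3 x 3 products for one result pixel.
    return sum(S(i + n, j + m) * K(n, m) for n in range(3) for m in range(3))


def column_sum(i, j, n):
    # Intermediate data: source column i multiplied by kernel column n.
    return sum(S(i, j + m) * K(n, m) for m in range(3))


i, j = 0, 0
shared = column_sum(i + 2, j, 0)  # computed once, used by both pixels
d_left = column_sum(i, j, 0) + column_sum(i + 1, j, 1) + shared
d_right = shared + column_sum(i + 3, j, 1) + column_sum(i + 4, j, 0)

print(d_left == direct(i, j), d_right == direct(i + 2, j))
```

The column sum for source column i + 2 appears as the last term of D (i, j) and as the first term of D (i + 2, j), so it needs to be computed only once.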

Next, referring to Fig. 2, each section in the symmetric type 
image filter processing apparatus at the embodiment of the present 
invention is explained in more detail. 

The row-wise intermediate data generating section 1 generates,
as the intermediate data, cumulative multiplication and addition
results, which are the added results of the multiplication of source
pixels of one row by the kernel coefficients by using the SIMD commands.
The row-wise intermediate data generating section 1 stores these
intermediate data in an intermediate data storing region T of the
memory 3.

The row-wise intermediate data utilizing section 2 obtains
operation result pixels of one row in the symmetric type image filter
processing apparatus by reading out the N pieces of intermediate data
stored in the intermediate data storing region T of the memory 3, and
further by cumulatively adding the intermediate data by using the SIMD
command. As the filtering processes for the whole image, the processes
for one row are repeated as many times as there are rows.

The memory 3 provides the intermediate data storing region T,
in which the intermediate data generated at the row-wise intermediate
data generating section 1 are stored, and an operation result pixel
storing region D, in which operation result pixels obtained at the
row-wise intermediate data utilizing section 2 are stored.

Next, referring to the drawings, the processes at each section in
the symmetric type image filter processing apparatus at the embodiment
of the present invention are explained. Fig. 4A is a flowchart showing
processes at the row-wise intermediate data generating section 1 shown
in Fig. 2. Fig. 4B is a flowchart showing processes at the row-wise
intermediate data utilizing section 2 shown in Fig. 2. Referring to Figs.
4A and 4B, the processes are explained.

In the explanation below, the following variables are defined
for obtaining the operation result pixels from the source image S.
The variable showing the row number in the source image S being
processed at present is defined as " j ", and the variable showing the
column number, in row " j " of the source image S, at which the
intermediate data are being obtained is defined as " i ". Further, the
variable showing the column number in the N x M kernel coefficients
used for the processing is defined as " n ", and the variable showing
the row number in the N x M kernel coefficients used for the processing
in column " n " is defined as " m ".

First, referring to Fig. 4A, the processes at the row-wise
intermediate data generating section 1 are explained.

The row-wise intermediate data generating section 1 initializes
the variables " i ", " n " and " m " to " 0 ", and also initializes all
elements in the SIMD register X to " 0 " (steps S11 to S13).

Next, the row-wise intermediate data generating section 1
multiplies the Q sequential source pixels S (i, j + m) to
S (i + Q - 1, j + m), from the " i " th column to the " i + Q - 1 " th
column of row (j + m), by the kernel coefficient K (n, m) simultaneously,
and cumulatively adds the products, by using the SIMD multiplying
command and the SIMD adding command once each. The row-wise intermediate
data generating section 1 simultaneously stores the Q sequential results
obtained from this operation in the sequential element positions in the
SIMD register X (step S14).

After this, the row-wise intermediate data generating section 1
increases the variable " m " by one, and compares the increased " m "
with M. When the increased " m " is less than M (No at step S15), the
process of the step S14 is repeated. When the increased " m " has
become M (Yes at the step S15), the process goes to the next step.
That is, the row-wise intermediate data generating section 1 multiplies
the source pixels S of the Q columns by the kernel coefficients of the
same kernel column respectively, and cumulatively adds the multiplied
results in the SIMD register X.

The row-wise intermediate data generating section 1 then
stores the results obtained at the steps S14 and S15 (the Q sequential
pieces of intermediate data) in the sequential positions (i, n) to
(i + Q - 1, n) in the intermediate data storing region T of the memory 3,
by using the SIMD storing command once (step S16).

By the operation mentioned above, the multiplied and added 
results of the " i " th column of the source pixels S and the " n " th column 
of the kernel coefficients are stored in the intermediate data storing 
region T of the memory 3. 

After this, the row-wise intermediate data generating section 1
increases the variable " n " by one, and when the increased " n " is
less than (N + 1) / 2 (No at step S17), the process returns to the step
S13. When " n " has become (N + 1) / 2 (Yes at the step S17),
the process goes to the next step. The row-wise intermediate data
generating section 1 increases " i " by Q, and when the
increased " i " is less than P (No at step S18), the process returns to
the step S12. When the increased " i " has become P (Yes at the
step S18), the intermediate data generating processes end.

As mentioned above, the row-wise intermediate data
generating section 1 executes the multiplication and addition operation
P/Q times, and generates {P x (N + 1) / 2} pieces of intermediate data.
Here, when P is not divisible by Q, the fractional part of P/Q is
discarded, and the operation is executed P/Q + 1 times.
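
The loop structure of Fig. 4A described above can be sketched in Python as follows. This is a minimal illustration, not the embodiment itself: the SIMD register X is emulated by a plain list of Q lanes, the array layouts S[row][column] and K[m][n] and the function name are assumptions made for this sketch, and the sample pixel and kernel values are hypothetical.

```python
def generate_intermediate(S, K, j, P, Q, N, M):
    """Return T with T[n][i] = sum over m of S[j + m][i] * K[m][n],
    for the (N + 1) / 2 kernel columns n = 0 .. (N - 1) / 2."""
    half = (N + 1) // 2
    T = [[0] * P for _ in range(half)]
    for i in range(0, P, Q):                    # S12 / S18: advance by Q columns
        lanes = range(i, min(i + Q, P))         # a last partial group is allowed
        for n in range(half):                   # S13 / S17: kernel column n
            X = [0] * Q                         # SIMD register X cleared
            for m in range(M):                  # S14 / S15: kernel row m
                for q, col in enumerate(lanes):     # stands in for one SIMD multiply-add
                    X[q] += S[j + m][col] * K[m][n]
            for q, col in enumerate(lanes):     # S16: stands in for one SIMD store
                T[n][col] = X[q]
    return T


# Hypothetical 3-row source image and symmetric 3 x 3 kernel.
S = [[10 * r + c + 1 for c in range(8)] for r in range(3)]
K = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]
T = generate_intermediate(S, K, j=0, P=8, Q=4, N=3, M=3)
print(T[0])
print(T[1])
```

Only (N + 1) / 2 kernel columns are processed, which is where the saving over the conventional N columns comes from. The partial final group handles the case where P is not divisible by Q.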

Next, referring to Fig. 4B, the processes at the row-wise
intermediate data utilizing section 2 are explained.

The row-wise intermediate data utilizing section 2 initializes
the variables " i " and " n " to " 0 ", and also initializes all elements in the
SIMD register X to " 0 " (steps S21 and S22).

The row-wise intermediate data utilizing section 2 compares
the value of " n " with the value of (N + 1) / 2, for deciding the
intermediate data to be referred to when operation result pixels
are obtained. When the value of " n " is (N + 1) / 2 or more, the value
of (N - 1 - n) is assigned to the variable " o ", and when the
value of " n " is less than (N + 1) / 2, the value of " n "
is assigned to the variable " o ". By these processes, the Q sequential
pieces of intermediate data stored at T (i + n, o) to T (i + Q - 1 + n, o)
in the intermediate data storing region T of the memory 3 are
decided as the intermediate data to be referred to. The row-wise
intermediate data utilizing section 2 then cumulatively adds the decided
intermediate data to the sequential element positions in the SIMD
register X, by using the SIMD adding command once (step S23).

After this, the row-wise intermediate data utilizing section 2
increases the variable " n " by one, and compares the increased " n "
with N. When the increased " n " is less than N (No at step
S24), the process returns to the step S23, and the step S23 is repeated.
That is, the row-wise intermediate data utilizing section 2 repeats the
process at the step S23 N times. When the increased " n " has become
N (Yes at the step S24), the process goes to the next step.

And the row-wise intermediate data utilizing section 2 stores 
the operation result pixels obtained at the steps S23 and S24 in the 
sequential positions ( i, j ) to ( i + Q - 1, j ) in the operation result pixel 
storing region D of the memory 3, by once using the SIMD storing 
command (step S25). 

After this, the row-wise intermediate data utilizing section 2
increases " i " by Q, and compares the increased " i " with
P. When the increased " i " is less than P (No at step S26), the
process returns to the step S22. When the increased " i " has become
P, the processes for obtaining the operation result pixels end.

As mentioned above, the filtering processes for the source 
image S are executed, that is, the operation result pixels are obtained. 
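
The loop structure of Fig. 4B can likewise be sketched in Python. As before, this is only an illustration under stated assumptions: the SIMD register X is emulated by a list, T is laid out as T[o][column], the function name is hypothetical, and result pixels are produced only for columns whose N neighbouring intermediate data all exist (the text does not discuss image borders, so this boundary handling is an assumption of the sketch).

```python
def utilize_intermediate(T, N, Q):
    """Sum N pieces of intermediate data per result pixel,
    mirroring the kernel column index for the right half."""
    half = (N + 1) // 2
    width = len(T[0])
    out_w = width - N + 1                 # assumption: fully covered columns only
    D = [0] * out_w
    for i in range(0, out_w, Q):          # S22 / S26: advance by Q columns
        count = min(Q, out_w - i)
        X = [0] * count                   # SIMD register X cleared
        for n in range(N):                # S23 / S24
            o = N - 1 - n if n >= half else n   # mirrored kernel column
            for q in range(count):        # stands in for one SIMD add
                X[q] += T[o][i + q + n]
        D[i:i + count] = X                # S25: stands in for one SIMD store
    return D


# Hypothetical intermediate data for N = 3 (two stored kernel columns).
T = [[1, 2, 3, 4, 5, 6], [10, 20, 30, 40, 50, 60]]
print(utilize_intermediate(T, N=3, Q=2))   # [24, 36, 48, 60]
```

Each result pixel needs only N additions here, because every multiplication was already done when the intermediate data were generated.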

Next, the filtering processes for the source image S are
explained by using an actual example. Fig. 5 is a diagram showing a
symmetric type image filter composed of symmetric kernel coefficients at
the embodiment of the present invention. In Fig. 5, as an example, a
Mexican hat shaped symmetric type image filter whose kernel size
is 13 x 13 (N = M = 13) is shown. Fig. 6 is a diagram showing a source
image used in the actual example at the embodiment of the present
invention. In the actual example, the source pixels in Fig. 6 whose
column numbers " i " are from " 0 " to " 3 " and whose row numbers " j "
are from " 0 " to " 12 " are used.



In the explanation below, P = 256 (the number of
pixels in the row direction) and Q = 4 (the number of sequential pixels
that can be simultaneously processed by the SIMD command) are set.
In order to focus on the first row of the source image, j = 0 is set.

First, the row-wise intermediate data generating section 1
executes the processes mentioned at the steps S11 to S13. That is, the
row-wise intermediate data generating section 1 initializes the variables
" i ", " n " and " m " to " 0 ", and also initializes all elements in the SIMD
register X to " 0 ".

The row-wise intermediate data generating section 1 then
repeats the processes mentioned at the steps S14 and S15. That is, the
variables " i " and " n " are fixed to " 0 " (i = 0 and n = 0), the value of
the variable " m " is increased by one at a time until it becomes 12,
and the cumulative multiplication and addition operation for
the kernel coefficients and the source pixels is executed. The
operation results are stored in the SIMD register X (0) to X (3). That
is, the row-wise intermediate data generating section 1 executes the
following operation and obtains Q pieces (four pieces) of intermediate
data.

X (0) = 1 x 0 + 11 x 0 + 21 x 0 + 31 x 0 + 41 x 0
      + 51 x (-1) + 61 x (-1) + 71 x (-1)
      + 81 x 0 + 91 x 0 + 101 x 0 + 111 x 0 + 121 x 0
X (1) = 2 x 0 + 12 x 0 + 22 x 0 + 32 x 0 + 42 x 0
      + 52 x (-1) + 62 x (-1) + 72 x (-1)
      + 82 x 0 + 92 x 0 + 102 x 0 + 112 x 0 + 122 x 0
X (2) = 3 x 0 + 13 x 0 + 23 x 0 + 33 x 0 + 43 x 0
      + 53 x (-1) + 63 x (-1) + 73 x (-1)
      + 83 x 0 + 93 x 0 + 103 x 0 + 113 x 0 + 123 x 0
X (3) = 4 x 0 + 14 x 0 + 24 x 0 + 34 x 0 + 44 x 0
      + 54 x (-1) + 64 x (-1) + 74 x (-1)
      + 84 x 0 + 94 x 0 + 104 x 0 + 114 x 0 + 124 x 0

Further, the row-wise intermediate data generating section 1
executes the process mentioned at the step S16. That is, the row-wise
intermediate data generating section 1 stores the four pieces of
intermediate data obtained at the processes mentioned above in the
intermediate data storing region T (i, n) to T (i + Q - 1, n) of the
memory 3 by using the SIMD storing command once. That is, the row-wise
intermediate data generating section 1 stores the intermediate data as
follows:

T (0, 0) <- X (0)
T (1, 0) <- X (1)
T (2, 0) <- X (2)
T (3, 0) <- X (3)

Next, the row-wise intermediate data generating section 1
executes the judgment mentioned at the step S17, after increasing the
variable " n " by one. That is, the value of the increased " n " is
compared with the value of (N + 1) / 2. Since " n " < (N + 1) / 2
{n = 1 and (N + 1) / 2 = (13 + 1) / 2 = 7}, the process returns to the
step S13. The value of the variable " m " is then increased by one at a
time until it becomes 12 at the row-wise intermediate data
generating section 1, and the cumulative multiplication and addition
operation for the kernel coefficients and the source pixels is executed.
The operation results are stored in the SIMD register X (0) to X (3).
That is, the row-wise intermediate data generating section 1 executes the
following operation and obtains Q pieces (four pieces) of intermediate
data.

X (0) = 1 x 0 + 11 x 0 + 21 x 0 + 31 x (-1) + 41 x (-1)
      + 51 x (-2) + 61 x (-2) + 71 x (-2)
      + 81 x (-1) + 91 x (-1) + 101 x 0 + 111 x 0 + 121 x 0
X (1) = 2 x 0 + 12 x 0 + 22 x 0 + 32 x (-1) + 42 x (-1)
      + 52 x (-2) + 62 x (-2) + 72 x (-2)
      + 82 x (-1) + 92 x (-1) + 102 x 0 + 112 x 0 + 122 x 0
X (2) = 3 x 0 + 13 x 0 + 23 x 0 + 33 x (-1) + 43 x (-1)
      + 53 x (-2) + 63 x (-2) + 73 x (-2)
      + 83 x (-1) + 93 x (-1) + 103 x 0 + 113 x 0 + 123 x 0
X (3) = 4 x 0 + 14 x 0 + 24 x 0 + 34 x (-1) + 44 x (-1)
      + 54 x (-2) + 64 x (-2) + 74 x (-2)
      + 84 x (-1) + 94 x (-1) + 104 x 0 + 114 x 0 + 124 x 0

Further, the row-wise intermediate data generating section 1
executes the process mentioned at the step S16 after the step S15. That
is, the row-wise intermediate data generating section 1 stores the four
pieces of intermediate data X (0) to X (3) obtained at the processes
mentioned above in the intermediate data storing region T (i, n) to
T (i + Q - 1, n) of the memory 3 by using the SIMD storing command once.
The row-wise intermediate data generating section 1 stores the
intermediate data as follows:

T (0, 1) <- X (0)
T (1, 1) <- X (1)
T (2, 1) <- X (2)
T (3, 1) <- X (3)
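
The eight sums written out above can be evaluated with a short Python sketch. Two things are inferred from those sums rather than stated in the text, and are therefore assumptions: the source pixel values follow S (i, j) = 10 x j + i + 1 for i = 0 to 3 and j = 0 to 12, and the first two kernel columns are the coefficient lists given below. The function and variable names are hypothetical.

```python
# Kernel columns n = 0 and n = 1, read off the coefficients in the sums above.
COL0 = [0, 0, 0, 0, 0, -1, -1, -1, 0, 0, 0, 0, 0]
COL1 = [0, 0, 0, -1, -1, -2, -2, -2, -1, -1, 0, 0, 0]


def S(i, j):
    # Assumed pixel pattern: 1, 11, 21, ... down column 0; 2, 12, ... down column 1.
    return 10 * j + i + 1


def column_sum(i, coeffs):
    # Multiply the 13 pixels of source column i by one kernel column and add.
    return sum(S(i, m) * c for m, c in enumerate(coeffs))


T0 = [column_sum(i, COL0) for i in range(4)]   # T (0, 0) to T (3, 0)
T1 = [column_sum(i, COL1) for i in range(4)]   # T (0, 1) to T (3, 1)
print(T0)   # [-183, -186, -189, -192]
print(T1)   # [-610, -620, -630, -640]
```

Under these assumptions, the four values stored for n = 0 and the four values stored for n = 1 are the ones printed.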

By executing the same operation mentioned above, the
row-wise intermediate data generating section 1 increases the value of
the variable " m " by one at a time up to 12, each time the value of
the variable " n " is increased through 2 to 6. The cumulative
multiplication and addition operation for the kernel coefficients and the
source pixels is executed, and the operation results are stored in the
SIMD register X. For each value of " n ", the row-wise intermediate data
generating section 1 stores the four pieces of intermediate data in the
intermediate data storing region T of the memory 3 by using the SIMD
storing command once.

After continuing the processes mentioned above, when the 
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rowwise intermediate data generating section 1 finished the process at 
the step S17 and went to the process at the step S18, a total of 28 pieces 
of the intermediate data are stored in the intermediate data storing 
region T ( 0, 0 ) to T ( 3, 6) of the memory 3. 
Then the row-wise intermediate data generating section 1
executes the process at the step S18. That is, the value of Q (Q = 4) is
added to the variable " i " (i = 0), and the increased value of " i " is
compared with the value of P (P = 256). When the increased " i " is
less than the value of P, the process returns to the step S12, and the
processes from the step S12 to the step S18 are repeated. When the
increased " i " reaches the value of P, the intermediate data generating
processes end.

When the intermediate data generating processes by the
row-wise intermediate data generating section 1 have ended, 1792 pieces
(256 x 7) of the intermediate data are stored in the intermediate data storing
region T (0, 0) to T (256, 6) of the memory 3.

After this, the row-wise intermediate data utilizing section 2
executes the processes at the steps S21 and S22 mentioned above. That
is, the row-wise intermediate data utilizing section 2 initializes the
variables " i " and " n " to " 0 ", and also initializes all elements in the
SIMD register X to " 0 ".

Then the row-wise intermediate data utilizing section 2
executes the processes at the steps S23 and S24 mentioned above, and
cumulatively adds the intermediate data held in the intermediate data storing
region T of the memory 3 into the SIMD register X. That is, the row-wise
intermediate data utilizing section 2 compares the value of " n " with the
value of (N + 1) / 2 each time the value of " n " is increased by one,
as " n " goes from 0 to 12.

When the value " n " >= the value (N + 1) / 2, the variable " o "
is set to the value (N - 1 - n). When the value " n " < the value (N
+ 1) / 2, the variable " o " is set to the value of " n ". The intermediate
data in the intermediate data storing region T (i + n, o) to T (i + Q - 1 + n,
o) of the memory 3, indicated by the variable " o ", are added cumulatively in the
SIMD register X ( 0 ) to X ( 3 ). That is, in the SIMD register X ( 0 ) to X
( 3 ), the following cumulatively added values are stored.
X ( 0 ) = T ( 0, 0 ) + T ( 1, 1 ) + T ( 2, 2 ) + T ( 3, 3 )
+ T ( 4, 4 ) + T ( 5, 5 ) + T ( 6, 6 ) + T ( 7, 5 )
+ T ( 8, 4 ) + T ( 9, 3 ) + T ( 10, 2 ) + T ( 11, 1 ) + T ( 12, 0 )
X ( 1 ) = T ( 1, 0 ) + T ( 2, 1 ) + T ( 3, 2 ) + T ( 4, 3 )
+ T ( 5, 4 ) + T ( 6, 5 ) + T ( 7, 6 ) + T ( 8, 5 )
+ T ( 9, 4 ) + T ( 10, 3 ) + T ( 11, 2 ) + T ( 12, 1 ) + T ( 13, 0 )
X ( 2 ) = T ( 2, 0 ) + T ( 3, 1 ) + T ( 4, 2 ) + T ( 5, 3 )
+ T ( 6, 4 ) + T ( 7, 5 ) + T ( 8, 6 ) + T ( 9, 5 )
+ T ( 10, 4 ) + T ( 11, 3 ) + T ( 12, 2 ) + T ( 13, 1 ) + T ( 14, 0 )
X ( 3 ) = T ( 3, 0 ) + T ( 4, 1 ) + T ( 5, 2 ) + T ( 6, 3 )
+ T ( 7, 4 ) + T ( 8, 5 ) + T ( 9, 6 ) + T ( 10, 5 )
+ T ( 11, 4 ) + T ( 12, 3 ) + T ( 13, 2 ) + T ( 14, 1 ) + T ( 15, 0 )

The cumulatively added values obtained by the processes
described above become the operation result pixels at the row positions
0 to 3 of the 13 x 13 Mexican hat shaped image filter.
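
The index folding and cumulative addition described above can be sketched as follows. This is an illustration under stated assumptions rather than the apparatus's implementation: `fold_index`, `filter_row`, and `filter_row_direct` are hypothetical names, the kernel is assumed to satisfy K[n][m] == K[N - 1 - n][m], and only positions where the whole kernel fits inside the source row are computed, since border handling is not described here. The direct version is included only to check that reusing the folded intermediate data gives the same pixels.

```python
def fold_index(n, N):
    """Map kernel column n to the stored intermediate column o."""
    return N - 1 - n if n >= (N + 1) // 2 else n


def filter_row(S, K, j=0):
    """One row of result pixels, reusing intermediate data via fold_index."""
    N, M = len(K), len(K[0])
    width = len(S[0])
    half = (N + 1) // 2
    # T[x][o]: intermediate datum for source column x and kernel column o.
    T = [[sum(K[o][m] * S[j + m][x] for m in range(M)) for o in range(half)]
         for x in range(width)]
    # D(i) = sum over n of T(i + n, o(n)), as in the expansions of X(0)..X(3).
    return [sum(T[i + n][fold_index(n, N)] for n in range(N))
            for i in range(width - N + 1)]


def filter_row_direct(S, K, j=0):
    """Reference: full N x M multiply-accumulate for every result pixel."""
    N, M = len(K), len(K[0])
    width = len(S[0])
    return [sum(K[n][m] * S[j + m][i + n] for n in range(N) for m in range(M))
            for i in range(width - N + 1)]


# Hypothetical 3 x 3 kernel with equal first and last columns.
K = [[1, 2, 1], [0, 5, 0], [1, 2, 1]]
S = [[3, 1, 4, 1, 5], [9, 2, 6, 5, 3], [5, 8, 9, 7, 9]]
D = filter_row(S, K)
```

For N = 13 the folded indices run 0, 1, ..., 6, 5, ..., 1, 0, which is the second-index pattern seen in the expansions of X ( 0 ) to X ( 3 ).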

Further, the row-wise intermediate data utilizing section 2
executes the process at the step S25, and stores the operation result
pixels held in the SIMD register X in the operation result pixel storing
region D ( i, j ) to D ( i + Q - 1, j ) of the memory 3. That is, the row-wise
intermediate data utilizing section 2 stores the operation result pixels in
the operation result pixel storing region D ( 0, 0 ) to D ( 3, 0 ) of the
memory 3 as follows:

D ( 0, 0 ) <- X ( 0 )
D ( 1, 0 ) <- X ( 1 )
D ( 2, 0 ) <- X ( 2 )
D ( 3, 0 ) <- X ( 3 )

The row-wise intermediate data utilizing section 2 repeats the
processes from the step S22 to the step S25 P / Q times (256 / 4 = 64),
increasing the variable " i " by the value Q each time until " i "
reaches the value P. Consequently, the operation result pixels of one
row are stored in the operation result pixel storing region D ( 0, 0 ) to D (
256, 0 ) of the memory 3. By the processes described above, the
operation result pixels of one row of the source image S are obtained.

As mentioned above, in the symmetric type image filter
processing apparatus of the embodiment of the present invention, the
number of SIMD command steps necessary for obtaining the
operation result pixels of one row is {2 x M x (N + 1) / 2 + N} x P / Q
steps. In the conventional technology, the number of necessary
SIMD command steps is 2 x N x M x P / Q. Therefore, the difference
between the conventional technology and the present invention is {N x
M - (N + M)} x P / Q steps, and the number of steps is greatly reduced
in the present invention.
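
The step counts can be checked numerically for the example values used in this description (N = M = 13, P = 256, Q = 4). The short Python sketch below only evaluates the two formulas given above. The function names are hypothetical, and N is assumed to be odd so that (N + 1) / 2 is an integer.

```python
def conventional_steps(N, M, P, Q):
    """SIMD command steps of the conventional technology: 2 x N x M x P / Q."""
    return 2 * N * M * P // Q


def proposed_steps(N, M, P, Q):
    """SIMD command steps of the embodiment: {2 x M x (N + 1) / 2 + N} x P / Q."""
    return (2 * M * ((N + 1) // 2) + N) * P // Q


N, M, P, Q = 13, 13, 256, 4
before = conventional_steps(N, M, P, Q)
after = proposed_steps(N, M, P, Q)
saved = before - after
print(before, after, saved)  # 21632 12480 9152
```

The saved count equals {N x M - (N + M)} x P / Q = 143 x 64 = 9152, which agrees with the difference stated above.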

Consequently, in the case of a symmetric type image filter
processing apparatus in which M is equal to N, the larger the
kernel size ( N = M > 3 ) is, the larger the difference becomes. Fig. 7 is
a diagram showing the reduction rate of the SIMD command steps in the
symmetric type image filter processing apparatus of the embodiment of
the present invention. As shown in Fig. 7, compared with the
conventional technology, up to about 50 % of the SIMD command steps can be
eliminated in the embodiment of the present invention.
That is, when the filter processing is executed by using a symmetric type
image filter processing apparatus in which the kernel size is large (N
and M are large), high speed processing can be realized by the
present invention.
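
The upper limit of about 50 % follows from the two step counts. As a worked sketch, assuming the reduction rate is defined as the number of saved steps divided by the number of conventional steps (the definition used for Fig. 7 is not stated in this passage):

```latex
\[
\frac{\{N M - (N + M)\}\,P/Q}{2 N M\,P/Q}
  = \frac{1}{2} - \frac{N + M}{2 N M},
\qquad
\left.\frac{1}{2} - \frac{N + M}{2 N M}\right|_{M = N}
  = \frac{1}{2} - \frac{1}{N}.
\]
```

Under this assumption, N = M = 13 gives 1/2 - 1/13, or about 42 %, and the rate approaches but never reaches 50 % as the kernel size grows.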

A part or all of the processes of the row-wise intermediate data
generating section 1 and the row-wise intermediate data utilizing section
2 can be executed by a program controlled by a CPU or an MPU.

As mentioned above, according to the embodiment of the
present invention, the SIMD command steps necessary for obtaining the
operation result pixels of one row in the symmetric type image filter
processing apparatus can be greatly reduced. Therefore, high speed
filtering of the image data can be realized.

While the present invention has been described with reference 
to the particular illustrative embodiment, it is not to be restricted by that 
embodiment but only by the appended claims. It is to be appreciated 
that those skilled in the art can change or modify the embodiment 
without departing from the scope and spirit of the present invention. 



